Abstract: Efficient optimal prefix coding has long been accomplished via the Huffman algorithm. However, there is still
room for improvement and exploration regarding variants of the
Huffman problem. Length-limited Huffman coding, useful for
many practical applications, is one such variant, for which codes
are restricted to the set of codes in which none of the $n$ codewords
is longer than a given length, $l_{\max}$. Binary length-limited coding
can be done in $O(n l_{\max})$ time and $O(n)$ space via the widely
used Package-Merge algorithm and with even smaller asymptotic
complexity using a lesser-known algorithm. In this paper these
algorithms are generalized without increasing complexity in
order to introduce a minimum codeword length constraint $l_{\min}$,
to allow for objective functions other than the minimization of
expected codeword length, and to be applicable to both binary
and nonbinary codes; nonbinary codes were previously addressed
using a slower dynamic programming approach. These extensions
have various applications, including fast decompression and
a modified version of the game "Twenty Questions," and can
be used to solve the problem of finding an optimal code with
limited fringe, that is, finding the best code among codes with a
maximum difference between the longest and shortest codewords.
The previously proposed method for solving this problem was
nonpolynomial time, whereas solving this using the novel linear-space
algorithm requires only $O(n(l_{\max} - l_{\min})^2)$ time, or even
less if $l_{\max} - l_{\min}$ is not $O(\log n)$.

I. Introduction 

The parlor game best known as "Twenty Questions" has 
a long history and a broad appeal. It was used to advance 
the plot of Charles Dickens' A Christmas Carol [10], in 
which it is called "Yes and No," and it was used to explain 
information theory in Alfred Renyi's A Diary on Information 
Theory [38], in which it is called "Bar-kochba." The two- 
person game begins with an answerer thinking up an object 
and then being asked a series of questions about the object by 
a questioner. These questions must be answered either "yes" or
"no." Usually the questioner can ask at most twenty questions, 
and the winner is determined by whether or not the questioner 
can sufficiently surmise the object from these questions. 

Many variants of the game exist — both in name and 
in rules. A recent popular variant replaces the questioner 
with an electronic device [5]. The answerer can answer the 
device's questions with one of four answers — "yes," "no," 
"sometimes," and "unknown." The game also differs from the 
traditional game in that the device often needs to ask more 
than twenty questions. If the device needs to ask more than
the customary twenty questions, the answerer can view this as
a partial victory, since the device has not answered correctly
given the initial twenty. However, the device eventually gives
up after 25 questions if it cannot guess the answerer's object.

Consider a short example of such a series of questions, with 
only "yes," "no," and "sometimes" as possible answers. The 
object to guess is one of the seven Newtonian colors [37], 
which we choose to enumerate as follows: 

1) Green (G) 

2) Yellow (Y) 

3) Red (R) 

4) Orange (O) 

5) Indigo (I)

6) Violet (V) 

7) Blue (B). 

A first question we ask might be, "Is the color seen as a warm 
color?" If the answer is "sometimes," the color is green. If it 
is "yes," it is one of colors 2 through 4. If so, we then ask, "Is 
the color considered primary?" "Sometimes" implies yellow, 
"yes" implies red, and "no" implies orange. If the color is not 
warm, it is one of colors 5 through 7, and we ask whether 
the color is considered purple, a different question than the 
one for colors 2 through 4. "Sometimes" implies indigo, "yes" 
implies violet, and "no" implies blue. Thus we can distinguish 
the seven colors with an average of $2 - p_1$ questions if $p_1$ is
the probability that the color in question is green.

This series of questions is expressible using code tree 
notation, e.g., [40], in which a tree is formed with each child 
split from its parent according to the corresponding output 
symbol, i.e., the answer of the corresponding question. A 
code tree corresponding to the above series of questions is 
shown in Fig. 1, where a left branch means "sometimes," a
middle branch means "yes," and a right branch means "no."
The number of answers possible is referred to by the constant
$D$ and the tree is a $D$-ary tree. In this case, $D = 3$ and the
code tree is ternary. The number of outputs, $n = 7$, is the
number of colors. 

The analogous problem in prefix coding is as follows: A
source (the answerer) emits input symbols (objects) drawn
from the alphabet $\mathcal{X} = \{1, 2, \ldots, n\}$, where $n$ is an integer.
Symbol $i$ has probability $p_i$, thus defining probability vector
$p = (p_1, p_2, \ldots, p_n)$. Only possible symbols are considered
for coding and these are sorted in decreasing order of probability;
thus $p_i > 0$ and $p_i \le p_j$ for every $i > j$ such that $i, j \in \mathcal{X}$.
(Since sorting is only $O(n \log n)$ time and $O(n)$ space, this
can be assumed without loss of generality.) Each input symbol
is encoded into a codeword composed of output symbols of the
$D$-ary alphabet $\{0, 1, \ldots, D - 1\}$. (In the example of colors,
0 represents "sometimes," 1 "yes," and 2 "no.") The codeword
$c_i$ corresponding to input symbol $i$ has length $l_i$, thus defining
length vector $l = (l_1, l_2, \ldots, l_n)$. In Fig. 1, for example, $c_7$
is $22_3$, the codeword corresponding to blue, so length
$l_7 = 2$. The overall code should be a prefix code, that is,
no codeword $c_i$ should begin with the entirety of another
codeword $c_j$. In the game, equivalently, we should know when
to end the questioning, this being the point at which we know
the answer.
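As an illustration of the prefix condition and of the $2 - p_1$ average, the seven-color code can be written out directly. This is a minimal sketch: the names `CODE`, `PROB`, `is_prefix_free`, and `expected_length` are hypothetical, and the probabilities in `PROB` are made up for the example rather than taken from the paper.

```python
from fractions import Fraction

# Ternary codewords for the seven colors (0 = "sometimes", 1 = "yes", 2 = "no").
CODE = {"G": "0", "Y": "10", "R": "11", "O": "12",
        "I": "20", "V": "21", "B": "22"}

# Hypothetical probabilities, chosen only for illustration.
PROB = {"G": Fraction(1, 4), **{c: Fraction(1, 8) for c in "YROIVB"}}

def is_prefix_free(codewords):
    """True if no codeword begins with the entirety of another."""
    words = sorted(codewords)
    # After sorting, a prefix relation always appears between neighbours.
    return all(not b.startswith(a) for a, b in zip(words, words[1:]))

def expected_length(code, prob):
    """Expected number of questions (output symbols) per object."""
    return sum(prob[s] * len(w) for s, w in code.items())

print(is_prefix_free(CODE.values()))   # the color code is a prefix code
print(expected_length(CODE, PROB))     # equals 2 - PROB["G"]
```

Only green is resolved in one question, so the average is one question for green and two for everything else, which is where $2 - p_1$ comes from.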

For the variant introduced here, all codewords must have 
lengths lying in a given interval $[l_{\min}, l_{\max}]$. In the example
of the device mentioned above, $l_{\min} = 20$ and $l_{\max} = 25$.
A more practical variant is the problem of designing a data 
codec which is efficient in terms of both compression ratio 
and coding speed. Moffat and Turpin proposed a variety of 
efficient implementations of prefix encoding and decoding in 
[35], each involving table lookups rather than code trees. They 
noted that the length of the longest codeword should be limited 
for computational efficiency's sake. Computational efficiency 
is also improved by restricting the overall range of codeword 
lengths, reducing the size of the tables and the expected time 
of searches required for decoding. Thus, one might wish to 
have a minimum codeword size of, say, $l_{\min} = 16$ bits and
a maximum codeword size of $l_{\max} = 32$ bits ($D = 2$). If
expected codeword length for an optimal code found under
these restrictions is too long, $l_{\min}$ can be reduced and the
algorithm rerun until the proper trade-off between coding 
speed and compression ratio is found. 

A similar problem is one of determining opcodes of a 
microprocessor designed to use variable-length opcodes, each 
a certain number of bytes ($D = 256$) with a lower limit and
an upper limit to size, e.g., a restriction to opcodes being
16, 24, or 32 bits long ($l_{\min} = 2$, $l_{\max} = 4$). This problem
clearly falls within the context considered here, as does the
problem of assigning video recorder scheduling codes; these
human-readable decimal codes ($D = 10$) have lower and
upper bounds on their size, such as $l_{\min} = 3$ and $l_{\max} = 8$,
respectively. 

Other problems of interest have $l_{\min} = 0$ and are thus
length limited but have no practical lower bound on length
[45, p. 396]. Yet other problems have not fixed bounds but a
constraint on the difference between minimum and maximum
codeword length, a quantity referred to as fringe [1, p. 121].
As previously noted, large fringe has a negative effect on the
speed of a decoder. In Section IX of this paper we discuss
how to find such codes.

Note that a problem of size $n$ is trivial for certain values
of $l_{\min}$ and $l_{\max}$. If $l_{\min} \ge \log_D n$, then all codewords can
have $l_{\min}$ output symbols, which, by any reasonable objective,
forms an optimal code. If $l_{\max} < \log_D n$, then we cannot
code all input symbols and the problem, as presented here,
has no solution. Since only other values are interesting, we
can assume that $n \in (D^{l_{\min}}, D^{l_{\max}}]$. For example, for the
modified form of Twenty Questions, $D = 4$, $l_{\min} = 20$, and
$l_{\max} = 25$, so we are only interested in problems where
$n \in (2^{40}, 2^{50}]$. Since most instances of Twenty Questions
have fewer possible outcomes, this is usually not an interesting 
problem after all, as instructive as it is. In fact, the fallibility 
of the answerer and ambiguity of the questioner mean that 
a decision tree model is not, strictly speaking, correct. For 
example, the aforementioned answers to questions about the 
seven colors are debatable. The other applications of length- 
bounded prefix coding mentioned previously, however, do fall 
within this model. 
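The three regimes described above (no solution, trivial, and interesting) can be told apart with integer arithmetic alone. The following is a small sketch; the function name `classify` and its string labels are hypothetical, and integer powers are used instead of logarithms to avoid floating-point rounding.

```python
def classify(n, D, l_min, l_max):
    """Classify a length-bounded coding instance by its size n."""
    if n > D ** l_max:
        # l_max < log_D n: not all input symbols can be coded.
        return "infeasible"
    if n <= D ** l_min:
        # l_min >= log_D n: every codeword can have length l_min.
        return "trivial"
    # Otherwise n lies in (D**l_min, D**l_max].
    return "interesting"

# Seven colors with ternary answers and lengths in [1, 2].
print(classify(7, 3, 1, 2))
# The modified Twenty Questions game with four answers.
print(classify(2 ** 40 + 1, 4, 20, 25))
```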

If we either do not require a minimum or do not require a
maximum, it is easy to find values of $l_{\min}$ or $l_{\max}$ which do not
limit the problem. As mentioned, setting $l_{\min} = 0$ results in a
trivial minimum, as does $l_{\min} = 1$. Similarly, setting $l_{\max} = n$
or using the hard upper bound $l_{\max} = \lceil (n-1)/(D-1) \rceil$ results
in a trivial maximum value. In the case of trivial maximum
values, one can actually minimize expected codeword length
in linear time given sorted inputs. This is possible because, at
each stage in the standard Huffman coding algorithm, the set
of Huffman trees is an optimal forest (set of trees) [20]. We
describe the linear-time algorithm in Section VII.

If both minimum and maximum values are trivial, Huffman
coding [21] yields a prefix code minimizing expected codeword
length

$$\sum_{i=1}^{n} p_i l_i. \qquad (1)$$

The conditions necessary and sufficient for the existence of
a prefix code with length vector $l$ are the integer constraint,
$l_i \in \mathbb{Z}_+$, and the Kraft (McMillan) inequality [34],

$$\kappa(l) \triangleq \sum_{i=1}^{n} D^{-l_i} \le 1. \qquad (2)$$

Finding values for $l$ is sufficient to find a corresponding code,
as a code tree with the optimal length vector can be built from
sorted codeword lengths in $O(n)$ time and space.
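To make the last point concrete, the sketch below checks the Kraft inequality (2) and assigns codewords to sorted lengths by counting in base $D$. It is an illustrative construction under stated assumptions, not the paper's procedure: the names `kraft_sum` and `code_from_lengths` are hypothetical, and the digit extraction costs time proportional to the total codeword length rather than the $O(n)$ bound quoted above.

```python
from fractions import Fraction

def kraft_sum(lengths, D):
    """Exact Kraft sum of a length vector for a D-ary code."""
    return sum(Fraction(1, D ** l) for l in lengths)

def code_from_lengths(lengths, D):
    """Assign D-ary codewords (digit tuples) to nondecreasing lengths."""
    lengths = list(lengths)
    if lengths != sorted(lengths):
        raise ValueError("lengths must be sorted in nondecreasing order")
    if kraft_sum(lengths, D) > 1:
        raise ValueError("Kraft inequality violated; no prefix code exists")
    code, value, prev = [], 0, lengths[0] if lengths else 0
    for l in lengths:
        value *= D ** (l - prev)          # extend the counter to length l
        digits, v = [], value
        for _ in range(l):                # write value as l base-D digits
            v, d = divmod(v, D)
            digits.append(d)
        code.append(tuple(reversed(digits)))
        value += 1                        # next unused codeword of this length
        prev = l
    return code

print(code_from_lengths([1, 2, 2, 2, 2, 2, 2], 3))
```

For the seven-color lengths this reproduces the codewords of the ternary question tree, with the single length-1 codeword first.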

It is not always obvious that we should minimize the
expected number of questions $\sum_i p_i l_i$ (or, equivalently, the
expected number of questions in excess of the first $l_{\min}$,

$$\sum_{i=1}^{n} p_i (l_i - l_{\min})^+, \qquad (3)$$

where $x^+$ is $x$ if $x$ is positive, 0 otherwise). Consider the
example of video recorder scheduling codes. In such an
application, one might instead want to minimize mean square
distance from $l_{\min}$,

$$\sum_{i=1}^{n} p_i (l_i - l_{\min})^2.$$




Fig. 1. A monotonic code tree for $n = 7$ and $D = 3$ with $l = (1, 2, 2, 2, 2, 2, 2)$: Each leaf contains the ternary output code, the corresponding object
number, and the initial for the corresponding color as in Section I. The $\alpha$ values are as defined in Section VIII.



We generalize and investigate how to minimize the value

$$\sum_{i=1}^{n} p_i \varphi(l_i - l_{\min}) \qquad (4)$$

under the above constraints for any penalty function $\varphi(\cdot)$
convex and increasing on $\mathbb{R}_+$. Such an additive measurement
of cost is called a quasiarithmetic penalty, in this case a convex
quasiarithmetic penalty.

One such function $\varphi$ is $\varphi(\delta) = (\delta + l_{\min})^2$, a quadratic value
useful in optimizing a communications delay problem [29].
Another function, $\varphi(\delta) = D^{t(\delta + l_{\min})}$ for $t > 0$, can be used
to minimize the probability of buffer overflow in a queueing
system [22].
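The penalty (4) is easy to evaluate for a given length vector, which helps when comparing candidate codes under different choices of $\varphi$. The sketch below is illustrative only: the function names are hypothetical and the integer weights are made up (the text notes that weights need not sum to 1).

```python
def penalty(p, lengths, l_min, phi):
    """Quasiarithmetic penalty: sum of p_i * phi(l_i - l_min)."""
    return sum(pi * phi(li - l_min) for pi, li in zip(p, lengths))

def excess_length(delta):
    """Linear penalty: questions in excess of the first l_min."""
    return delta

def mean_square(delta):
    """Squared distance from l_min."""
    return delta ** 2

def delay(l_min):
    """Quadratic penalty phi(delta) = (delta + l_min)**2."""
    return lambda delta: (delta + l_min) ** 2

p = [2, 1, 1, 1, 1, 1, 1]            # hypothetical weights
lengths = [1, 2, 2, 2, 2, 2, 2]
for phi in (excess_length, mean_square, delay(1)):
    print(penalty(p, lengths, 1, phi))
```

All three functions are convex and increasing on the nonnegative reals, so each fits the class of penalties considered here.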

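Mathematically stating the length-bounded problem,

$$
\begin{array}{ll}
\text{Given} & p = (p_1, \ldots, p_n),\ p_i > 0; \\
 & D \in \{2, 3, \ldots\}; \\
 & \text{convex, monotonically increasing } \varphi : \mathbb{R}_+ \to \mathbb{R}_+ \\
\text{Minimize}_{\{l\}} & \sum_i p_i \varphi(l_i - l_{\min}) \\
\text{subject to} & \sum_i D^{-l_i} \le 1; \\
 & l_i \in \{l_{\min}, l_{\min} + 1, \ldots, l_{\max}\}.
\end{array}
$$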



Note that we need not assume that probabilities pi sum to 1; 
they could instead be arbitrary positive weights. 

Thus, in this paper, given a finite $n$-symbol input alphabet
with an associated probability vector $p$, a $D$-symbol output
alphabet with codewords of lengths in $[l_{\min}, l_{\max}]$ allowed, and
a constant-time-calculable convex penalty function $\varphi$, we
describe an $O(n(l_{\max} - l_{\min}))$-time $O(n)$-space algorithm for
constructing a $\varphi$-optimal code, and sketch an even less complex
reduction for the most common penalty function, $\varphi(\delta) = \delta$,
minimization of expected codeword length. In the next
section, we present a brief review of the relevant literature.
In Section III we extend to $D$-ary codes an alternative to
code tree notation first presented in [29]. This notation aids
in solving the problem in question by reformulating it as an
instance of the $D$-ary Coin Collector's problem, presented in
Section IV as an extension of the (binary) Coin Collector's
problem [30]. An extension of the Package-Merge algorithm
solves this problem; we introduce the reduction and resulting
algorithm in Section V. We make it $O(n)$ space in Section VI
and refine it in Section VII. The alternative approach for the
expected length problem of minimizing (1), i.e., $\varphi(\delta) = \delta$,
is often faster; this approach is sketched in Section VIII.
Algorithmic modifications, applications, and possible extensions of
this work are discussed in Section IX.

II. Prior Work 

Reviewing how the problem in question differs from binary 
Huffman coding: 

1) It can be nonbinary, a case considered by Huffman in 
his original paper [21]; 

2) There is a maximum codeword length, a restriction
previously considered, e.g., [23] in $O(n^2 l_{\max} \log D)$
time [15] and $O(n^2 \log D)$ space, but solved efficiently
only for binary coding, e.g., [30] in $O(n l_{\max})$ time $O(n)$
space and most efficiently in [39];

3) There is a minimum codeword length, a novel restric- 
tion; 

4) The penalty can be nonlinear, a modification previously 
considered, but only for binary coding, e.g., [3]. 

There are several methods for finding optimal codes for 
various constraints and various types of optimality; we review 
the three most common families of methods here. Note that 
other methods fall outside of these families, such as a linear- 
time method for finding minimum expected length codewords 
for a uniform distribution with a given fringe [9]. (This differs 
from the limited-fringe problem of Section IX, in which the
distribution need not be uniform and fringe is upper-bounded, 
not fixed.) 

The first and computationally simplest of these are 
Huffman-like methods, originating with Huffman in 1952 [21] 
and discussed in, e.g., [7]. Such algorithms are generally 
linear time given sorted weights and thus $O(n \log n)$ time in
general. These are useful for a variety of problems involving 
penalties in linear, exponential, or minimax form, but not 
for other nonlinearities nor for length-limited coding. More 
complex variants of this algorithm are used to find optimal 
alphabetic codes, that is, codes with codewords constrained 
to be in a given lexicographical order. These variants are in 
the Hu-Tucker family of algorithms [13], [16], [20], which, at
$O(n \log n)$ time and $O(n)$ space [27], are the most efficient
algorithms known for solving such problems (although some
instances can be solved in linear time [17], [18]).

The second type of method, dynamic programming, is also 
conceptually simple but much more computationally com- 
plex. Gilbert and Moore proposed a dynamic programming 
approach in 1959 for finding optimal alphabetic codes, and, 
unlike the Hu-Tucker algorithm, this approach is readily
extensible to search trees [26]. Such an approach can also
solve the nonalphabetic problem as a special case, e.g., [12],
[19], [23], since any probability vector satisfying $p_i \le p_j$ for
every $i > j$ has an optimal alphabetic code that optimizes
the nonalphabetic case. A different dynamic programming
approach can be used to find optimal "1"-ended codes [6]
and optimal codes with unequal letter costs [14]. Itai [23]
used dynamic programming to optimize a large variety of
coding and search tree problems, including nonbinary length-limited
coding, which is done with $O(n^2 l_{\max} \log D)$ time and
$O(n^2 \log D)$ space by a reduction to the alphabetic case. We
reduce complexity significantly in this paper.

The third family is that of Package-Merge-based algorithms, 
and this is the type of approach we use for the generalized 
algorithm considered here. Introduced in 1990 by Larmore and 
Hirschberg [30], this approach is most often used for binary 
length-limited linear-penalty Huffman coding, although it has 
been extended for application to binary alphabetic codes [32] 
and to binary convex quasiarithmetic penalty functions [3]. 
The algorithms in this approach generally have $O(n l_{\max})$-time
$O(n)$-space complexity, although space complexity can vary
by application and implementation, and the alphabetic variant
and some nonquasiarithmetic (and thus nonlinear) variants
have slightly higher time complexity ($O(n l_{\max} \log n)$).

To use this approach for nonbinary coding with a lower 
bound on codeword length, we need to alter the approach, 
generalizing to the problem of interest. The minimum size 
constraint on codeword length requires a relatively simple 
change of solution range. The nonbinary coding generalization 
is a bit more involved; it requires first modifying the Package- 
Merge algorithm to allow for an arbitrary numerical base 
(binary, ternary, etc.), then modifying the coding problem to 
allow for a provable reduction to the modified Package-Merge 
algorithm. At times "dummy" inputs are added in order to 
assist in finding an optimal nonbinary code. In order to make 
the algorithm precise, the $O(n(l_{\max} - l_{\min}))$-time $O(n)$-space
algorithm, unlike some other implementations [30], minimizes 
height (that is, maximum codeword length) among optimal 
codes (if multiple optimal codes exist). 

III. Nodeset Notation

Before presenting an algorithm for optimizing the above 
problem, we introduce a notation for codes that generalizes 
one first presented in [29] and modified in [3]. Nodeset 
notation, an alternative to code tree notation, has previously 
been used for binary alphabets, but not for general $D$-ary
alphabet coding, thus the need for generalization. 

The key idea: Each node represents both the share of
the penalty (4) (weight) and the (scaled) share of the Kraft
sum (2) (width) assumed for the $l$th output symbol of the $i$th codeword.
By showing that total weight is an increasing function of the
penalty and that there is a one-to-one correspondence between
an optimal code and a corresponding optimal nodeset, we
reduce the problem to an efficiently solvable problem, the Coin
Collector's problem.

In order to do this, we first need to make a modification to 
the problem analogous to one Huffman made in his original 
nonbinary solution. We must in some cases add a "dummy" 
input or "dummy" inputs of infinitesimal probability $p_i = \epsilon > 0$
to the probability vector to assure that the optimal code
has the Kraft inequality satisfied with equality, an assumption
underlying both the Huffman algorithm and ours. The positive
probabilities of these dummy inputs mean that codes obtained
could be slightly suboptimal, but we later specify an algorithm
where $\epsilon = 0$, obviating this concern.

As with traditional Huffman coding [21], the number of
dummy values needed is $(D - n) \bmod (D - 1)$, where

$$x \bmod y \triangleq x - y \lfloor x/y \rfloor$$

for all integers $x$ (not just nonnegative integers). Such dummy
inputs allow us to assume that the optimal tree (for real plus
dummy items) is an optimal full tree (i.e., that $\kappa(l) = 1$,
where $\kappa$ is as defined in (2)). For sufficiently small $\epsilon$, the
code will be identical to that for $\epsilon = 0$, and, as in traditional
Huffman coding, nondummy codewords are identical to the
codewords of an optimal code for the original input distribution.
We can thus assume for our algorithm that $\kappa(l) = 1$ and
$n \bmod (D - 1) = 1$.
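The dummy count can be computed directly, since Python's `%` operator with a positive modulus already follows the floor-based definition of mod given above. This is a small sketch; the function name `dummy_count` is hypothetical.

```python
def dummy_count(n, D):
    """Number of dummy inputs needed so that n mod (D - 1) = 1."""
    # Python's % returns a nonnegative result for a positive modulus,
    # matching x mod y = x - y * floor(x / y) even when D - n < 0.
    return (D - n) % (D - 1)

for n, D in [(7, 3), (6, 3), (5, 4), (5, 2)]:
    k = dummy_count(n, D)
    # After padding, the symbol count is congruent to 1 modulo D - 1.
    print(n, D, k, (n + k) % (D - 1) == 1 % (D - 1))
```

For binary codes ($D = 2$) the modulus is 1, so no dummy inputs are ever needed.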

With this we now present nodeset notation: 
Definition 1: A node is an ordered pair of integers $(i, l)$
such that $i \in \{1, \ldots, n\}$ and $l \in \{l_{\min} + 1, \ldots, l_{\max}\}$. Call
the set of all possible nodes $I$. This set can be arranged in an
$n \times (l_{\max} - l_{\min})$ grid, e.g., Fig. 2. The set of nodes, or nodeset,
corresponding to input symbol $i$ (assigned codeword $c_i$ with
length $l_i$) is the set of the first $l_i - l_{\min}$ nodes of column $i$,
that is, $\eta_l(i) = \{(j, l) \mid j = i,\ l \in \{l_{\min} + 1, \ldots, l_i\}\}$. The
nodeset corresponding to length vector $l$ is $\eta(l) = \bigcup_i \eta_l(i)$;
this corresponds to a set of $n$ codewords, a code. Thus, in
Fig. 2, the dashed line surrounds a nodeset corresponding to
$l = (1, 2, 2, 2, 2, 2, 2)$. We say a node $(i, l)$ has width $\rho(i, l) = D^{-l}$
and weight $\mu(i, l) = p_i \varphi(l - l_{\min}) - p_i \varphi(l - l_{\min} - 1)$,
as shown in the example in Fig. 2. Note that if $\varphi(l) = l$,
$\mu(i, l) = p_i$.

We must emphasize that the above "nodes" are unlike nodes 
in a graph; similar structures are sometimes instead called tiles 
[31], but we retain the original, more prevalent term "nodes." 
Given valid nodeset $N \subseteq I$, it is straightforward to find
the corresponding length vector and, if it satisfies the Kraft 
inequality, a code. 
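A short sketch may help fix the notation: it builds the nodeset of a length vector and evaluates node widths and weights as defined above. The helper names (`nodeset`, `width`, `weight`) and the integer weights are assumptions made for illustration; symbols are indexed from 1 as in the text.

```python
from fractions import Fraction

def nodeset(lengths, l_min):
    """Nodes (i, l) with l_min < l <= l_i, for 1-indexed symbols i."""
    return {(i, l) for i, li in enumerate(lengths, start=1)
            for l in range(l_min + 1, li + 1)}

def width(node, D):
    """Width of node (i, l): its share D**(-l) of the scaled Kraft sum."""
    return Fraction(1, D ** node[1])

def weight(node, p, l_min, phi):
    """Weight of node (i, l): the penalty added by the l-th symbol."""
    i, l = node
    return p[i - 1] * (phi(l - l_min) - phi(l - l_min - 1))

p = [2, 1, 1, 1, 1, 1, 1]                 # hypothetical weights
N = nodeset([1, 2, 2, 2, 2, 2, 2], l_min=0)
square = lambda d: d ** 2
print(len(N))                                       # 13 nodes
print(sum(width(v, 3) for v in N))                  # total width
print(sum(weight(v, p, 0, square) for v in N))      # total weight
```

With a linear penalty every node in column $i$ has weight $p_i$; with the quadratic penalty used here the second node of a column weighs three times the first.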

IV. The D-ary Coin Collector's Problem and the 
Package-Merge Algorithm 

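We find optimal codes by first solving a related problem, the
Coin Collector's problem. Let $D^{\mathbb{Z}}$ denote the set of all integer
powers of a fixed integer $D > 1$. The Coin Collector's problem
of size $m$ considers "coins" indexed by $i \in \{1, 2, \ldots, m\}$.
Each coin has a width, $\rho_i \in D^{\mathbb{Z}}$; one can think of width as
coin face value, e.g., $\rho_i = 0.25 = 2^{-2}$ for a quarter dollar
(25 cents). Each coin also has a weight, $\mu_i \in \mathbb{R}$. The final
problem parameter is total width, denoted $\rho_{\rm tot}$. The problem
is then:

$$
\begin{array}{ll}
\text{Minimize}_{\{B \subseteq \{1, \ldots, m\}\}} & \sum_{i \in B} \mu_i \\
\text{subject to} & \sum_{i \in B} \rho_i = \rho_{\rm tot} \\
\text{where} & m \in \mathbb{Z}_+ \\
 & \mu_i \in \mathbb{R} \\
 & \rho_i \in D^{\mathbb{Z}} \\
 & \rho_{\rm tot} \in \mathbb{R}_+.
\end{array}
\qquad (5)
$$

Fig. 2. The set of nodes $I$ with widths $\{\rho(i, l)\}$ and weights $\{\mu(i, l)\}$ for $\varphi(\delta) = \delta^2$, $n = 7$, $D = 3$, $l_{\min} = 1$.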

We thus wish to choose coins with total width ptot such that 
their total weight is as small as possible. This problem is 
an input-restricted variant of the knapsack problem. However, 
given sorted inputs, a linear-time solution to (5) for $D = 2$
was proposed in [30]. The algorithm in question is called the 
Package-Merge algorithm and we extend it here to arbitrary D. 

In our notation, we use $i \in \{1, \ldots, m\}$ to denote both the
index of a coin and the coin itself, and $\mathcal{I}$ to represent the
$m$ items along with their weights and widths. The optimal
solution, a function of total width $\rho_{\rm tot}$ and items $\mathcal{I}$, is denoted
$\mathrm{CC}(\mathcal{I}, \rho_{\rm tot})$ (the optimal coin collection for $\mathcal{I}$ and $\rho_{\rm tot}$). Note
that, due to ties, this need not be unique, but we assume that
one of the optimal solutions is chosen; at the end of Section VI,
we discuss how to break ties.

Because we only consider cases in which a solution exists,
$\rho_{\rm tot} = \omega \rho_{\rm pow}$ for some $\rho_{\rm pow} \in D^{\mathbb{Z}}$ and $\omega \in \mathbb{Z}_+$. Here,
assuming $\rho_{\rm tot} > 0$, $\rho_{\rm pow}$ and $\omega$ are the unique pair of a power
of $D$ and an integer that is not a multiple of $D$, respectively,
which, multiplied, form $\rho_{\rm tot}$. If $\rho_{\rm tot} = 0$, $\omega$ and $\rho_{\rm pow}$ are not
used. Note that $\rho_{\rm pow}$ need not be an integer.

Algorithm variables

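At any point in the algorithm, given nontrivial $\mathcal{I}$ and $\rho_{\rm tot}$, we
use the following definitions:

$$
\begin{array}{lll}
\text{Remainder} & \rho_{\rm pow} & \triangleq \text{the unique } x \in D^{\mathbb{Z}} \text{ such that } \rho_{\rm tot}/x \in \mathbb{Z} \setminus D\mathbb{Z} \\
\text{Minimum width} & \rho^* & \triangleq \min_{i \in \mathcal{I}} \rho_i \quad (\text{note } \rho^* \in D^{\mathbb{Z}}) \\
\text{Small width set} & \mathcal{I}^* & \triangleq \{i \mid \rho_i = \rho^*\} \quad (\text{note } \mathcal{I}^* \ne \emptyset) \\
\text{"First" item} & i^* & \triangleq \arg\min_{i \in \mathcal{I}^*} \mu_i \\
\text{"First" package} & \mathcal{P}^* & \triangleq
\begin{cases}
\mathcal{P} \text{ such that } |\mathcal{P}| = D,\ \mathcal{P} \subseteq \mathcal{I}^*,\ \mathcal{P} \preceq \mathcal{I}^* \setminus \mathcal{P}, & |\mathcal{I}^*| \ge D \\
\emptyset, & |\mathcal{I}^*| < D
\end{cases}
\end{array}
$$

where $D\mathbb{Z}$ denotes integer multiples of $D$ and $\mathcal{P} \preceq \mathcal{I}^* \setminus \mathcal{P}$
denotes that, for all $i \in \mathcal{P}$ and $j \in \mathcal{I}^* \setminus \mathcal{P}$, $\mu_i \le \mu_j$. Then the
following is a recursive description of the algorithm: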

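Recursive $D$-ary Package-Merge Procedure

Basis. $\rho_{\rm tot} = 0$: $\mathrm{CC}(\mathcal{I}, \rho_{\rm tot}) = \emptyset$.

Case 1. $\rho^* = \rho_{\rm pow}$ and $\mathcal{I} \ne \emptyset$: $\mathrm{CC}(\mathcal{I}, \rho_{\rm tot}) =
\mathrm{CC}(\mathcal{I} \setminus \{i^*\}, \rho_{\rm tot} - \rho^*) \cup \{i^*\}$.

Case 2a. $\rho^* < \rho_{\rm pow}$, $\mathcal{I} \ne \emptyset$, and $|\mathcal{I}^*| < D$: $\mathrm{CC}(\mathcal{I}, \rho_{\rm tot}) =
\mathrm{CC}(\mathcal{I} \setminus \mathcal{I}^*, \rho_{\rm tot})$.

Case 2b. $\rho^* < \rho_{\rm pow}$, $\mathcal{I} \ne \emptyset$, and $|\mathcal{I}^*| \ge D$: Create $i'$, a
new item with weight $\mu_{i'} = \sum_{i \in \mathcal{P}^*} \mu_i$ and width $\rho_{i'} = D\rho^*$.
This new item is thus a combined item, or package, formed
by combining the $D$ least weighted items of width $\rho^*$. Let
$S = \mathrm{CC}(\mathcal{I} \setminus \mathcal{P}^* \cup \{i'\}, \rho_{\rm tot})$ (the optimization of the packaged
version). If $i' \in S$, then $\mathrm{CC}(\mathcal{I}, \rho_{\rm tot}) = S \setminus \{i'\} \cup \mathcal{P}^*$; otherwise,
$\mathrm{CC}(\mathcal{I}, \rho_{\rm tot}) = S$.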

Theorem 1: If an optimal solution to the Coin Collector's 
problem exists, the above recursive (Package-Merge) algo- 
rithm will terminate with an optimal solution. 
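The recursive procedure can be unrolled into a loop in which each package carries the labels of the coins it contains, so that choosing a package automatically expands it. The following is a minimal sketch under stated assumptions, not the linear-time implementation: it re-sorts at every step, the names `rho_pow` and `package_merge` are hypothetical, the coin data are made up, and $\rho_{\rm tot}$ is assumed to be an integer multiple of some power of $D$ (otherwise `rho_pow` would not terminate).

```python
from fractions import Fraction

def rho_pow(rho_tot, D):
    """Power x of D such that rho_tot / x is an integer not divisible by D."""
    x = Fraction(1)
    while (rho_tot / x).denominator != 1:   # shrink x until it divides rho_tot
        x /= D
    while (rho_tot / x) % D == 0:           # grow x while quotient is a multiple of D
        x *= D
    return x

def package_merge(coins, rho_tot, D):
    """coins: iterable of (width, weight, label). Returns the labels of a
    minimum-weight subset with total width rho_tot."""
    rho_tot = Fraction(rho_tot)
    items = [(Fraction(w), mu, (label,)) for w, mu, label in coins]
    chosen = []
    while rho_tot > 0:                      # Basis: stop when rho_tot = 0
        if not items:
            raise ValueError("no solution")
        pw = rho_pow(rho_tot, D)
        w_min = min(it[0] for it in items)
        if w_min > pw:
            raise ValueError("no solution")
        small = sorted((it for it in items if it[0] == w_min),
                       key=lambda it: it[1])
        rest = [it for it in items if it[0] != w_min]
        if w_min == pw:                     # Case 1: take the "first" item
            chosen.extend(small[0][2])
            rho_tot -= w_min
            items = rest + small[1:]
        elif len(small) < D:                # Case 2a: too few items to package
            items = rest
        else:                               # Case 2b: package the D lightest
            group = small[:D]
            package = (w_min * D, sum(it[1] for it in group),
                       tuple(lab for it in group for lab in it[2]))
            items = rest + [package] + small[D:]
    return set(chosen)

coins = [(3, 5, "a"), (1, 4, "b"), (1, 2, "c"), (1, 1, "d"), (1, 1, "e")]
print(package_merge(coins, 5, 3))
```

In this made-up ternary instance the two lightest unit-width coins are taken first, the remaining two unit-width coins are too few to package and are dropped, and the width-3 coin completes the total width of 5.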



Fig. 3. A simple example of the Package-Merge algorithm (three rows, with $\rho_{\rm tot} = 5 = 12_3$, $\rho_{\rm tot} = 3 = 10_3$, and $\rho_{\rm tot} = 0 = 0_3$).



Proof: We show that the Package-Merge algorithm
produces an optimal solution via induction on the number of
input items. The basis is trivially correct, and each inductive
case reduces the number of items by at least one. The inductive
hypothesis on $\rho_{\rm tot} > 0$ and $\mathcal{I} \ne \emptyset$ is that the algorithm is
correct for any problem instance with fewer input items than
instance $(\mathcal{I}, \rho_{\rm tot})$.

If $\rho^* > \rho_{\rm pow} > 0$, or if $\mathcal{I} = \emptyset$ and $\rho_{\rm tot} \ne 0$, then there is
no solution to the problem, contrary to our assumption. Thus
all feasible cases are covered by those given in the procedure.
Case 1 indicates that the solution must contain at least one
element (item or package) of width $\rho^*$. These must include
the minimum weight item in $\mathcal{I}^*$, since otherwise we could
substitute one of the items with this "first" item and achieve
improvement. Case 2 indicates that the solution must contain
a number of elements of width $\rho^*$ that is a multiple of $D$. If
this number is 0, none of the items in $\mathcal{P}^*$ are in the solution. If
it is not, then they all are. Thus, if $\mathcal{P}^* = \emptyset$, the number is 0,
and we have Case 2a. If not, we may "package" the items,
considering the replaced package as one item, as in Case 2b.
Thus the inductive hypothesis holds. ■

Fig. 3 presents a simple example of this algorithm at work
for $D = 3$, finding minimum total weight items of total
width $\rho_{\rm tot} = 5$ (or, in ternary, $12_3$). In the figure, item width
represents numeric width and item area represents numeric
weight. Initially, as shown in the top row, the minimum weight
item has width $\rho^* = \rho_{\rm pow} = 1$. This item is put into
the solution set, and the next step repeats the task on the
items remaining outside the solution set. Then, the remaining
minimum width items are packaged into a merged item of
width 3 ($10_3$), as in the middle row. Finally, the minimum
weight item/package with width $\rho^* = \rho_{i^*} = \rho_{\rm pow} = 3$ is
added to complete the solution set, which is now of weight 7.
The remaining packaged item is left out in this case; when
the algorithm is used for coding, several items are usually left
out of the optimal set. Given input sorted first by width then
weight, the resulting algorithm is $O(m)$ time and space.

V. A General Algorithm 

We now formalize the reduction from the coding problem
to the Coin Collector's problem. This generalizes the similar
reduction shown in [3] for binary codes with only a limit on
maximum length, which is in turn a generalization of [30]
for length-limited binary codes with linear $\varphi$, the traditional
penalty function.

We assert that any optimal solution $N$ of the Coin Collector's
problem with total width

$$\rho_{\rm tot} = \frac{n - D^{l_{\min}}}{D - 1} D^{-l_{\min}}$$

on coins $\mathcal{I}$ (identical to the set of all possible nodes $I$) is a
nodeset for an optimal solution of the coding problem. This
yields a suitable method for solving the problem.

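To show this reduction, we first define $\rho(N)$ in a natural
manner for any $N = \eta(l)$:

$$
\rho(N) \triangleq \sum_{(i,l) \in N} \rho(i, l)
= \sum_{i=1}^{n} \sum_{l = l_{\min}+1}^{l_i} D^{-l}
= \sum_{i=1}^{n} \frac{D^{-l_{\min}} - D^{-l_i}}{D - 1}
= \frac{n D^{-l_{\min}} - \kappa(l)}{D - 1}
$$

where $\kappa(l)$ is the Kraft sum (2). Given $n \bmod (D - 1) = 1$,
all optimal codes have the Kraft inequality satisfied with
equality; otherwise, the longest codeword length could be
shortened by one, strictly decreasing the penalty without
violating the inequality. Thus the optimal solution has $\kappa(l) = 1$
and

$$\rho(N) = \frac{n - D^{l_{\min}}}{D - 1} D^{-l_{\min}}.$$

Also define:

$$\mu(N) \triangleq \sum_{(i,l) \in N} \mu(i, l).$$

Note that

$$
\mu(N) = \sum_{(i,l) \in N} \mu(i, l)
= \sum_{i=1}^{n} \sum_{l = l_{\min}+1}^{l_i} \mu(i, l)
= \sum_{i=1}^{n} p_i \varphi(l_i - l_{\min}) - \sum_{i=1}^{n} p_i \varphi(0).
$$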

Since the subtracted term is a constant, if the optimal nodeset 
corresponds to a valid code, solving the Coin Collector's 
problem solves this coding problem. To prove the reduction, 
we need to prove that the optimal nodeset indeed corresponds 
to a valid code. We begin with the following lemma: 

Lemma 1: Suppose that $N$ is a nodeset of width $xD^{-k} + r$
where $k$ and $x$ are integers and $0 < r < D^{-k}$. Then $N$ has a
subset $R$ with width $r$.

Proof: Let us use induction on the cardinality of the set.
The base case $|N| = 1$ is trivial since then $x = 0$. Assume
the lemma holds for all $|N| < n$, and suppose $|N| = n$.
Let $\rho^* = \min_{j \in N} \rho_j$ and $j^* = \arg\min_{j \in N} \rho_j$. We can view
item $j^*$ of width $\rho^* \in D^{\mathbb{Z}}$ as the smallest contributor to the
width of $N$ and $r$ as the portion of the $D$-ary expansion of the
width of $N$ to the right of $D^{-k}$. Then $r$ must be an integer
multiple of $\rho^*$. If $r = \rho^*$, $R = \{j^*\}$ is a solution. Otherwise
let $N' = N \setminus \{j^*\}$ (so $|N'| = n - 1$) and let $R'$ be the subset
obtained from solving the lemma for set $N'$, whose width is
$xD^{-k} + (r - \rho^*)$. Then $R = R' \cup \{j^*\}$. ■
We now prove the reduction: 

Theorem 2: Any $N$ that is a solution of the Coin Collector's
problem for

$$\rho_{\rm tot} = \rho(N) = \frac{n - D^{l_{\min}}}{D - 1} D^{-l_{\min}}$$

has a corresponding length vector $l^N$ such that $N = \eta(l^N)$
and $\mu(N) = \min_l \sum_i p_i \varphi(l_i - l_{\min}) - \varphi(0) \sum_i p_i$.

Proof: Any optimal length vector nodeset has $\rho(\eta(l)) =
\rho_{\rm tot}$. Suppose $N$ is a solution to the Coin Collector's problem
but is not a valid nodeset of a length vector. Then there exists
an $(i, l)$ with $l \in [l_{\min} + 2, l_{\max}]$ such that $(i, l) \in N$ and
$(i, l - 1) \in I \setminus N$. Let $R' \triangleq N \cup \{(i, l - 1)\} \setminus \{(i, l)\}$. Then
$\rho(R') = \rho_{\rm tot} + (D - 1)D^{-l}$ and, due to convexity, $\mu(R') \le
\mu(N)$. Using $n \bmod (D - 1) = 1$, we know that $\rho_{\rm tot}$ is an
integer multiple of $D^{-l_{\min}}$. Thus, using Lemma 1 with $k =
l_{\min}$, $x = \rho_{\rm tot} D^{l_{\min}}$, and $r = (D - 1)D^{-l}$, there exists an
$R \subset R'$ such that $\rho(R) = r$. Since $\mu(R) > 0$, $\mu(R' \setminus R) <
\mu(R') \le \mu(N)$. This is a contradiction to $N$ being an optimal
solution to the Coin Collector's problem, and thus any optimal
solution of the Coin Collector's problem corresponds to an
optimal length vector. ■
Because the Coin Collector's problem is linear in time and
space (same-width inputs are presorted by weight, numerical
operations and comparisons are constant time), the overall
algorithm finds an optimal code in $O(n(l_{\max} - l_{\min}))$ time
and space. Space complexity, however, can be decreased.
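The reduction can be sanity-checked on tiny instances by exhaustive search: the cheapest set of nodes with total width $\rho_{\rm tot}$ should weigh exactly the optimal penalty minus $\varphi(0) \sum_i p_i$. The sketch below does this by brute force rather than by Package-Merge, so it is exponential in the number of nodes and only suitable for toy inputs; the function name `check_reduction` and the weights are hypothetical, and the input is assumed to satisfy $n \bmod (D - 1) = 1$ already.

```python
from fractions import Fraction
from itertools import combinations, product

def check_reduction(p, D, l_min, l_max, phi=lambda d: d):
    """Return (best coin-collection weight, best penalty - phi(0)*sum(p))."""
    n = len(p)
    nodes = [(i, l) for i in range(n) for l in range(l_min + 1, l_max + 1)]
    rho = {(i, l): Fraction(1, D ** l) for i, l in nodes}
    mu = {(i, l): p[i] * (phi(l - l_min) - phi(l - l_min - 1))
          for i, l in nodes}
    rho_tot = Fraction(n - D ** l_min, (D - 1) * D ** l_min)
    # Coin Collector's problem by enumeration of all node subsets.
    best_coins = min(sum(mu[v] for v in S)
                     for r in range(len(nodes) + 1)
                     for S in combinations(nodes, r)
                     if sum(rho[v] for v in S) == rho_tot)
    # Coding problem by enumeration of all bounded length vectors.
    best_code = min(sum(pi * phi(li - l_min) for pi, li in zip(p, ls))
                    for ls in product(range(l_min, l_max + 1), repeat=n)
                    if sum(Fraction(1, D ** li) for li in ls) <= 1)
    return best_coins, best_code - phi(0) * sum(p)

print(check_reduction([8, 4, 2, 1], 2, 0, 3))
print(check_reduction([40, 20, 10, 10, 10, 5, 5], 3, 1, 2))
```

In both examples the two returned values agree, as the theorem predicts.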

VI. A Deterministic $O(n)$-Space Algorithm

If $p_i = p_j$, we are guaranteed no particular inequality
relation between $l_i$ and $l_j$ since we did not specify a method
for breaking ties. Thus the length vector returned by the
algorithm need not have the property that $l_i \le l_j$ whenever
$i < j$. We would like to have an algorithm that has such a
monotonicity property.

Definition 2: A monotonic nodeset, $N$, is one with the
following properties:

$$(i, l) \in N \Rightarrow (i + 1, l) \in N \quad \text{for } i < n \qquad (6)$$

$$(i, l) \in N \Rightarrow (i, l - 1) \in N \quad \text{for } l > l_{\min} + 1. \qquad (7)$$

In other words, a nodeset is monotonic if and only if it corresponds
to a length vector $l$ with lengths sorted in increasing
order; this definition is equivalent to that given in [30].
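Properties (6) and (7) translate directly into a membership test. This is a small sketch with hypothetical helper names (`nodeset`, `is_monotonic`), using 1-indexed symbols as in the definition.

```python
def nodeset(lengths, l_min):
    """Nodes (i, l) with l_min < l <= l_i, for 1-indexed symbols i."""
    return {(i, l) for i, li in enumerate(lengths, start=1)
            for l in range(l_min + 1, li + 1)}

def is_monotonic(N, n, l_min):
    """Check properties (6) and (7) for a set of nodes N."""
    for i, l in N:
        if i < n and (i + 1, l) not in N:            # property (6)
            return False
        if l > l_min + 1 and (i, l - 1) not in N:    # property (7)
            return False
    return True

print(is_monotonic(nodeset([1, 2, 2, 2, 2, 2, 2], 0), 7, 0))  # sorted lengths
print(is_monotonic(nodeset([2, 1, 1], 0), 3, 0))              # unsorted lengths
print(is_monotonic({(3, 2)}, 3, 0))                           # gap in a column
```

The first set corresponds to a sorted length vector, the second to an unsorted one, and the third is not the nodeset of any length vector.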

Examples of monotonic nodesets include the sets of nodes
enclosed by dashed lines in Fig. 2 and Fig. 4. In the latter
case, $n = 21$, $D = 3$, $l_{\min} = 2$, and $l_{\max} = 8$, so $\rho_{\rm tot} = 2/3$.
As indicated, if $p_i = p_j$ for some $i$ and $j$, then an optimal
nodeset need not be monotonic. However, if all probabilities
are distinct, the optimal nodeset is monotonic.

Lemma 2: If $p$ has no repeated values, then any optimal
solution $N = \mathrm{CC}(I, n - 1)$ is monotonic.

Proof: The second monotonic property (7) was proved
for optimal nodesets in Theorem 2. The first property (6) can
be shown via a simple exchange argument. Consider optimal
$l$ with $i > j$ so that $p_i < p_j$, and also consider $l'$ with lengths
for inputs $i$ and $j$ interchanged, as in [8, pp. 97-98]. Then

$$
\sum_k p_k \varphi(l'_k - l_{\min}) - \sum_k p_k \varphi(l_k - l_{\min})
= (p_j - p_i)\left[\varphi(l_i - l_{\min}) - \varphi(l_j - l_{\min})\right]
\ge 0
$$

where the inequality is due to the optimality of $l$. Since
$p_j - p_i > 0$ and $\varphi$ is monotonically increasing, $l_i \ge l_j$ for
all $i > j$ and an optimal nodeset without repeated $p$ must be
monotonic. ■
Taking advantage of monotonicity in a Package-Merge 
coding implementation to trade off a constant factor of time 
for drastically reduced space complexity is done in [29] for 
length-limited binary codes. We extend this to the length- 
bounded problem, first for p without repeated values, then 
for arbitrary p. 



Note that the total width of items that are each less than
or equal to width $\rho$ is less than $2n\rho$. Thus, when we are
processing items and packages of width $\rho$, fewer than $2n$
packages are kept in memory. The key idea in reducing space
complexity is to keep only four attributes of each package in
memory instead of the full contents. In this manner, we use
$O(n)$ space while retaining enough information to reconstruct
the optimal nodeset in algorithmic postprocessing.

Define

$$l_{\text{mid}} \triangleq \left\lfloor \tfrac{1}{2}\left(l_{\max} + l_{\min} + 1\right) \right\rfloor.$$

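For each package $S$, we retain only the following attributes:

1) $\mu(S) \triangleq \sum_{(i,l) \in S} \mu(i,l)$

2) $\rho(S) \triangleq \sum_{(i,l) \in S} \rho(i,l)$

3) $\nu(S) \triangleq |S \cap I_{\text{mid}}|$

4) $\psi(S) \triangleq \sum_{(i,l) \in S \cap I_{\text{hi}}} \rho(i,l)$

where $I_{\text{hi}} \triangleq \{(i,l) \mid l > l_{\text{mid}}\}$ and $I_{\text{mid}} \triangleq \{(i,l) \mid l = l_{\text{mid}}\}$.
We also define $I_{\text{lo}} \triangleq \{(i,l) \mid l < l_{\text{mid}}\}$.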
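Because each retained attribute is a sum or a count over the nodes of a package, the attributes of a merged package can be obtained by adding those of its parts, without storing the nodes themselves. The following Python sketch illustrates this bookkeeping. The names `PackageAttrs` and `node_attrs` are hypothetical, and the sketch assumes node width $D^{-l}$ and node weight $p_i(\varphi(l - l_{\min}) - \varphi(l - l_{\min} - 1))$, as is usual for this kind of Package-Merge reduction.

```python
from dataclasses import dataclass
from fractions import Fraction


@dataclass(frozen=True)
class PackageAttrs:
    weight: Fraction   # total weight of the package
    width: Fraction    # total width of the package
    midct: int         # number of nodes on level l_mid
    hiwidth: Fraction  # total width of nodes on levels above l_mid

    def __add__(self, other):
        # Attributes of a disjoint union are sums of the parts' attributes.
        return PackageAttrs(self.weight + other.weight,
                            self.width + other.width,
                            self.midct + other.midct,
                            self.hiwidth + other.hiwidth)


def node_attrs(p_i, l, D, l_min, l_mid, phi):
    """Attributes of the single node (i, l); weight and width as assumed above."""
    width = Fraction(1, D ** l)
    weight = p_i * (phi(l - l_min) - phi(l - l_min - 1))
    hiwidth = width if l > l_mid else Fraction(0)
    return PackageAttrs(weight, width, int(l == l_mid), hiwidth)
```

Merging two packages is then a single addition of two constant-size records, which is what keeps the first run within linear space.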

With only these parameters, the "first run" of the algorithm
takes $O(n)$ space. The output of this run is the package
attributes of the optimal nodeset $N$. Thus, at the end of this
first run, we know the value for $n_\nu \triangleq \nu(N)$, and we can
consider $N$ as the disjoint union of four sets, shown in Fig. 4:

1) $A$ = nodes in $N \cap I_{\text{lo}}$ with indices in $[1, n - n_\nu]$,

2) $B$ = nodes in $N \cap I_{\text{lo}}$ with indices in $[n - n_\nu + 1, n]$,

3) $\Gamma$ = nodes in $N \cap I_{\text{mid}}$,

4) $\Delta$ = nodes in $N \cap I_{\text{hi}}$.

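Due to the monotonicity of $N$, it is clear that $B = [n - n_\nu + 1, n] \times [l_{\min} + 1, l_{\text{mid}} - 1]$
and $\Gamma = [n - n_\nu + 1, n] \times \{l_{\text{mid}}\}$.
Note then that $\rho(B) = n_\nu (D^{-l_{\min}} - D^{1 - l_{\text{mid}}})/(D - 1)$ and
$\rho(\Gamma) = n_\nu D^{-l_{\text{mid}}}$. Thus we need merely to recompute which
nodes are in $A$ and in $\Delta$.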
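The two closed-form widths can be checked against a direct sum over the corresponding rectangle of nodes. The sketch below assumes that node $(i, l)$ has width $D^{-l}$; the function names are illustrative only.

```python
from fractions import Fraction


def width_B(n_nu, D, l_min, l_mid):
    """Closed form for the width of B: n_nu indices on levels l_min+1 .. l_mid-1."""
    return n_nu * (Fraction(1, D ** l_min) - Fraction(D, D ** l_mid)) / (D - 1)


def width_Gamma(n_nu, D, l_mid):
    """Closed form for the width of Gamma: n_nu nodes on level l_mid."""
    return n_nu * Fraction(1, D ** l_mid)


def width_rectangle(num_indices, D, levels):
    """Brute-force width of a full rectangle of nodes, one node per index and level."""
    return sum(num_indices * Fraction(1, D ** l) for l in levels)
```

For instance, with $D = 3$, $l_{\min} = 2$, $l_{\text{mid}} = 5$ and four indices, both `width_B(4, 3, 2, 5)` and the direct sum over levels 3 and 4 give 16/81.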

Because $\Delta$ is a subset of $I_{\text{hi}}$, $\rho(\Delta) = \psi(N)$ and $\rho(A) =
\rho(N) - \rho(B) - \rho(\Gamma) - \rho(\Delta)$. Given their respective widths, $A$ is
a minimal weight subset of $[1, n - n_\nu] \times [l_{\min} + 1, l_{\text{mid}} - 1]$ and
$\Delta$ is a minimal weight subset of $[n - n_\nu + 1, n] \times [l_{\text{mid}} + 1, l_{\max}]$.
These are monotonic if the overall nodeset is monotonic. The
nodes at each level of $A$ and $\Delta$ can thus be found by recursive
calls to the algorithm. This approach uses only $O(n)$ space
while preserving time complexity; one run of an algorithm on
$n(l_{\max} - l_{\min})$ nodes is replaced with a series of runs, first
one on $n(l_{\max} - l_{\min})$ nodes, then two on an average of at
most $n(l_{\max} - l_{\min})/4$ nodes each, then four on an average of
at most $n(l_{\max} - l_{\min})/16$, and so forth. An optimization of
the same complexity is made in [30], where it is proven that
this yields $O(n(l_{\max} - l_{\min}))$ time complexity with a linear
space requirement. Given the hard bounds for $l_{\max}$ and $l_{\min}$,
this is always $O(n^2/D)$.
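The preserved time bound can be read off a geometric series. As a sketch, assume that recursion depth $j$ consists of $2^j$ runs on an average of at most $n(l_{\max} - l_{\min})/4^j$ nodes each, and that a single run is linear in its number of nodes. The total work is then bounded by

$$\sum_{j \ge 0} 2^j \cdot \frac{n(l_{\max} - l_{\min})}{4^j} = n(l_{\max} - l_{\min}) \sum_{j \ge 0} 2^{-j} \le 2n(l_{\max} - l_{\min}),$$

so the recursion costs at most a constant factor more than a single run.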

The assumption of distinct $p_i$'s puts an undesirable restriction
on our input that we now relax. In doing so, we
make the algorithm deterministic, resolving ties that make
certain minimization steps of the algorithm implementation
dependent. This results in what in some sense is the "best"
optimal code if multiple monotonic optimal codes exist.



Recall that $\boldsymbol{p}$ is a nonincreasing vector. Thus items of a
given width are sorted for use in the Package-Merge algorithm;
this order is used to break ties. For example, if we look
at the problem in Fig. 2 ($\varphi(\delta) = \delta^2$, $n = 7$, $D = 3$,
$l_{\min} = 1$, $l_{\max} = 4$) with probability vector $\boldsymbol{p} =
(0.4, 0.3, 0.14, 0.06, 0.06, 0.02, 0.02)$, then nodes $(7, 4)$, $(6, 4)$,
and $(5, 4)$ are the first to be grouped, the tie between $(5, 4)$
and $(4, 4)$ broken by order. Thus, at any step, all identical-width
items in one package have adjacent indices. Recall that
packages of items will be either in the final nodeset or absent
from it as a whole. This scheme then prevents any of the
nonmonotonicity that identical $p_i$'s might bring about.

In order to assure that the algorithm is fully deterministic,
the manner in which packages and single items are merged
must also be taken into account. We choose to combine
nonmerged items before merged items in the case of ties, in
a similar manner to the two-queue bottom-merge method of
Huffman coding [40], [44]. Thus, in our example, there is
a point at which the node $(2, 2)$ is chosen (to be merged
with $(3, 2)$ and $(4, 2)$) while the identical-weight package
of items $(5, 3)$, $(6, 3)$, and $(7, 3)$ is not. This leads to the
optimal length vector $\boldsymbol{l} = (1, 2, 2, 2, 2, 2, 2)$, rather than $\boldsymbol{l} =
(1, 1, 2, 2, 3, 3, 3)$ or $\boldsymbol{l} = (1, 1, 2, 3, 2, 3, 3)$, which are also
optimal. The corresponding nodeset is enclosed within the
dashed line in Fig. 2, and the resulting monotonic code tree
is the code tree shown in Fig. 1.
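The three length vectors named above can be compared numerically. The following Python sketch uses exact fractions and takes the penalty to be $\sum_i p_i \varphi(l_i - l_{\min})$ with $\varphi(\delta) = \delta^2$, as in the example; the helper names are illustrative.

```python
from fractions import Fraction

# Example parameters from the text: D = 3, l_min = 1, phi(delta) = delta**2.
p = [Fraction(x, 100) for x in (40, 30, 14, 6, 6, 2, 2)]
D, l_min = 3, 1


def penalty(lengths):
    """Sum of p_i * phi(l_i - l_min) for the quadratic penalty."""
    return sum(q * (l - l_min) ** 2 for q, l in zip(p, lengths))


def kraft_sum(lengths):
    """Sum of D**(-l_i); equals 1 for a full D-ary code tree."""
    return sum(Fraction(1, D ** l) for l in lengths)


def reverse_lex_key(lengths):
    """Lengths sorted largest to smallest, for reverse lexicographical order."""
    return sorted(lengths, reverse=True)


candidates = [(1, 2, 2, 2, 2, 2, 2), (1, 1, 2, 2, 3, 3, 3), (1, 1, 2, 3, 2, 3, 3)]
best = min(candidates, key=reverse_lex_key)
```

All three candidates have penalty 3/5 and Kraft sum 1, and `best` is `(1, 2, 2, 2, 2, 2, 2)`, the vector with the smallest maximum length.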

This approach also enables us to set $\epsilon$, the value for dummy
variables, equal to 0 without violating monotonicity. As in
bottom-merge Huffman coding, the code with the minimum
reverse lexicographical order among optimal codes (and thus
the one with minimum height) is the one produced; reverse
lexicographical order is the lexicographical order of lengths
after their being sorted largest to smallest. An identical result
can be obtained by using the position of the "largest" node in a
package (in terms of position number $nl + i$) in order to choose
those with lower values, as in [32]. However, our approach,
which can be shown to be equivalent via simple induction,
eliminates the need for keeping track of the maximum value
of $nl + i$ for each package.

VII. Further Refinements 

There are changes we can make to the algorithm that, for
certain inputs, result in even better performance. For example,
if $l_{\max} \approx \log_D n$, then, rather than minimizing the weight of
nodes of a certain total width, it is easier to maximize weight
over a complementary total width and find the complementary
set of nodes. Similarly, if most input symbols have one of
a handful of probability values, one can consider this and
simplify calculations. These and other similar optimizations
have been done in the past for the special case $\varphi(\delta) = \delta$,
$l_{\min} = 0$, $D = 2$ [24], [33], [36], [42], [43], though we do
not address or extend such improvements here.

So far we have assumed that $l_{\max}$ is the best upper bound
on codeword length we could obtain. However, there are many
cases in which we can narrow the range of codeword lengths,
thus making the algorithm faster. For example, since, as stated
previously, we can assume without loss of generality that
$l_{\max} \le \lceil (n - 1)/(D - 1) \rceil$, we can eliminate the bottom row
of nodes from consideration in Fig. 2.

Fig. 4. The set of nodes $I$, an optimal nodeset $N$, and disjoint subsets $A$, $B$, $\Gamma$, $\Delta$

Consider also when $l_{\min} = 0$. An upper bound on maximum codeword length can
be derived from a theorem and a definition due to Larmore:

Definition 3: Consider penalty functions $\varphi$ and $\chi$. We say
that $\chi$ is flatter than $\varphi$ if, for positive integers $l' > l$,

$$(\chi(l) - \chi(l-1))(\varphi(l') - \varphi(l'-1)) \le (\varphi(l) - \varphi(l-1))(\chi(l') - \chi(l'-1))$$

[29].
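The definition can be checked by brute force over a finite range of lengths. The sketch below reads the condition as $(\chi(l) - \chi(l-1))(\varphi(l') - \varphi(l'-1)) \le (\varphi(l) - \varphi(l-1))(\chi(l') - \chi(l'-1))$ for all positive integers $l' > l$; the function name `is_flatter` and the cutoff `max_len` are assumptions made for illustration.

```python
def is_flatter(chi, phi, max_len):
    """Check whether chi is flatter than phi for all 1 <= l < l2 <= max_len."""
    for l in range(1, max_len):
        for l2 in range(l + 1, max_len + 1):
            lhs = (chi(l) - chi(l - 1)) * (phi(l2) - phi(l2 - 1))
            rhs = (phi(l) - phi(l - 1)) * (chi(l2) - chi(l2 - 1))
            if lhs > rhs:
                return False
    return True
```

With this reading, the quadratic penalty is flatter than the linear penalty on any finite range, while the linear penalty is not flatter than the quadratic one.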

A consequence of the Convex Hull Theorem of [29] is that,
given $\chi$ flatter than $\varphi$, for any $\boldsymbol{p}$, there exist $\varphi$-optimal $\boldsymbol{l}^{(\varphi)}$
and $\chi$-optimal $\boldsymbol{l}^{(\chi)}$ such that $\boldsymbol{l}^{(\varphi)}$ is greater than $\boldsymbol{l}^{(\chi)}$ in terms
of reverse lexicographical order. This explains why the word
"flatter" is used.

Penalties flatter than the linear penalty (i.e., convex $\varphi$)
can therefore yield a useful upper bound, reducing complexity.
Thus, if $l_{\min} = 0$, we can use the results of a pre-algorithmic
Huffman coding of the input symbols to find an upper bound
on codeword length in linear time, one that might be better
than $l_{\max}$. Alternatively, we can use the least probable input
to find a looser upper bound, as in [4].

When $l_{\min} \ge 1$, one can still use a modified pre-algorithmic
Huffman coding to find an upper bound as long as $\varphi(\delta) =
\delta$. This is done via a modification of the Huffman algorithm
allowing an arbitrary minimum $l_{\min}$ and a trivial maximum
(e.g., $l_{\max} = n$ or $\lceil (n-1)/(D-1) \rceil$):

Procedure for length-lower-bounded ("truncated
Huffman") coding

1) Add $(D - n) \bmod (D - 1)$ dummy items of probability 0.

2) Combine the items with the $D$ smallest probabilities
$p_{i_1}, p_{i_2}, \ldots, p_{i_D}$ into one item with the combined
probability $\tilde{p}_i = \sum_{j=1}^{D} p_{i_j}$. This item has codeword $\tilde{c}_i$,
to be determined later, while these $D$ smallest items
are assigned concatenations of this yet-to-be-determined
codeword and every possible output symbol, that is,
$c_{i_1} = \tilde{c}_i 0$, $c_{i_2} = \tilde{c}_i 1$, $\ldots$, $c_{i_D} = \tilde{c}_i (D - 1)$. Since these
have been assigned in terms of $\tilde{c}_i$, replace the smallest
$D$ items with $\tilde{p}_i$ in $\boldsymbol{p}$ to form $\tilde{\boldsymbol{p}}$.

3) Repeat previous step, now with the remaining $n - D + 1$
codewords and corresponding probabilities, until only
$D^{l_{\min}}$ items are left.

4) Assign all possible $l_{\min}$ long codewords to these items,
thus defining the overall code based on the fixed-length
code assigned to these combined items.
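The four steps above can be sketched in Python as follows. This is a minimal illustration that returns codeword lengths only, not codewords. It uses a binary heap rather than the linear-time two-queue method, and its tie-breaking rule (unmerged items before merged ones) is an assumption chosen in the spirit of bottom-merge. The function name is hypothetical.

```python
import heapq
from fractions import Fraction


def truncated_huffman_lengths(p, D, l_min):
    """Codeword lengths from Huffman merging stopped at D**l_min items."""
    n = len(p)
    dummies = (D - n) % (D - 1)            # step 1: pad with zero-probability items
    target = D ** l_min
    if n + dummies < target:
        raise ValueError("need at least D**l_min items")
    probs = [Fraction(q) for q in p] + [Fraction(0)] * dummies
    depth = [0] * len(probs)               # number of merges above each item
    # Heap entries are (probability, tie_break, member_items). Unmerged items
    # get tie_break <= 0 and merged items get tie_break > 0, so that ties
    # prefer unmerged items.
    heap = [(q, -i, [i]) for i, q in enumerate(probs)]
    heapq.heapify(heap)
    next_id = len(probs)
    while len(heap) > target:              # steps 2 and 3: repeated D-way merges
        total, members = Fraction(0), []
        for _ in range(D):
            q, _tie, items = heapq.heappop(heap)
            total += q
            members.extend(items)
        for i in members:
            depth[i] += 1
        heapq.heappush(heap, (total, next_id, members))
        next_id += 1
    # Step 4: each remaining item receives a codeword of length l_min.
    return [d + l_min for d in depth[:n]]
```

For the probability vector (0.4, 0.3, 0.14, 0.06, 0.06, 0.02, 0.02) with $D = 3$ and $l_{\min} = 1$, this sketch yields lengths (1, 1, 2, 2, 3, 3, 3).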

This procedure is Huffman coding truncated midway 
through coding, the resulting trees serving as subtrees of 
nodes of identical depth. Excluding the last step, the algorithm 
is identical to that shown in [41] to result in an optimal 
Huffman forest. The optimality of the algorithm for length- 
lower-bounded coding is an immediate consequence of the 
optimality of the forest, as both have the same constraints 
and the same value to minimize. As with the usual Huffman 
algorithm, this can be made linear time given sorted inputs [44] 
and can be made to find a code with the minimum reverse 
lexicographical order among optimal codes via the bottom- 
merge variant. 

Clearly, this algorithm finds the optimal code for the length-bounded
problem if the resulting code has no codeword longer
than $l_{\max}$, whether this be because $l_{\max}$ is trivial or because of
other specifications of the problem. If this truncated Huffman
algorithm fails, then we know that $l_n = l_{\max}$, that is, we
cannot have that $l_n < l_{\max}$ for the length-bounded code. This
is an intuitive result, but one worth stating and proving, as it
is used in the next section:

Lemma 3: If a (truncated) Huffman code ($\varphi(\delta) = \delta$) for
$l_{\min}$ has a codeword longer than some $l_{\text{th}}$, then there exists
an optimal length-bounded code for bound $[l_{\min}, l_{\text{th}}]$ with
codewords of length $l_{\text{th}}$.

Proof: It suffices to show that, if an optimal code for
the bound $[l_{\min}, l_{\max}]$ has a codeword with length $l_{\max}$, then
an optimal code for the bound $[l_{\min}, l_{\max} - 1]$ has a codeword
with length $l_{\max} - 1$, since this can be applied inductively
from $l_{\max} = l_n$ (assuming $l_n$ is the length of the longest
codeword of the truncated Huffman code) to $l_{\text{th}}$, obtaining
the desired result. The optimal nodeset $N$ for the bound
$[l_{\min}, l_{\max}]$ has width $(n - D^{l_{\min}}) D^{-l_{\min}}/(D - 1)$. Therefore,
in the course of the Package-Merge algorithm, we at one point
have $(n - D^{l_{\min}})/(D - 1)$ packages of width $D^{-l_{\min}}$ which
will eventually comprise optimal nodeset $N$, these packages
having weight no larger than the remaining packages of the
same width.

Consider the nodeset $N'$ formed by making each $(i, l)$
in $N$ into $(i, l - 1)$. This nodeset is the solution to the
Package-Merge algorithm for the total width $D^{-l_{\min}+1}(n -
D^{l_{\min}})/(D - 1)$ with bounds $l_{\min} - 1$ and $l_{\max} - 1$. Let
$\iota(l)$ denote the number of nodes on level $l$. Then $\iota(l_{\min}) \ge
n - D^{l_{\min}}$ since at most $D^{l_{\min}}$ nodes can have length $l_{\min}$. The
subset of $N'$ not of depth $l_{\min} - 1$ is thus an optimal solution
for bounds $l_{\min}$ and $l_{\max} - 1$ with total width

$$\left( \frac{D(n - D^{l_{\min}})}{D - 1} - \iota(l_{\min}) \right) D^{-l_{\min}},$$

that is, at one point in the algorithm this solution corresponds
to the $D(n - D^{l_{\min}})/(D - 1) - \iota(l_{\min})$ least weighted packages
of width $D^{-l_{\min}}$. Due to the bounds on $\iota(l_{\min})$, this number
of packages is less than the number of packages of the same
width in the optimal nodeset for bounds $l_{\min}$ and $l_{\max} - 1$
(with total width $(n - D^{l_{\min}}) D^{-l_{\min}}/(D - 1)$). Thus an optimal
nodeset to the shortened problem can contain the (shifted-by-one)
original nodeset and must have its maximum length
achieved for all input symbols for which the original nodeset
achieves maximum length. ■
Thus we can find whether $l_n = l_{\max}$ by merely doing pre-algorithmic
bottom-merge Huffman coding (which, when $l_n \le
l_{\max}$, results in reduced computation). This is useful in finding
a faster algorithm for large $l_{\max} - l_{\min}$ and linear $\varphi$.

VIII. A Faster Algorithm for the Linear Penalty

A somewhat different reduction, one analogous to the reduction
of [31], is applicable if $\varphi(\delta) = \delta$. This more specific
algorithm has similar space complexity and strictly better time
complexity unless $l_{\max} - l_{\min} = O(\log n)$. However, we only
sketch this approach here, in less detail than our previous
explanation of the simpler, more general approach.

Consider again the code tree representation, that using a
$D$-ary tree to represent the code. A codeword is represented
by successive splits from the root to a leaf (one split for
each output symbol) so that the length of a codeword is
represented by the length of the path to its corresponding
leaf. A vertex that is not a leaf is called an internal vertex;
each internal vertex of the tree in Fig. 1 is shown as a black
circle. We continue to use dummy variables to ensure that
$n \bmod (D - 1) = 1$, and thus an optimal tree has $\kappa(\boldsymbol{l}) = 1$;
equivalently, all internal vertices have $D$ children. We also
continue to assume without loss of generality that the output
tree is monotonic. An optimal tree given the constraints of
the problem will have no internal vertices at level $l_{\max}$,
$(n - D^{l_{\min}})/(D - 1)$ internal vertices in the $l_{\max} - l_{\min}$
previous levels, and $(D^{l_{\min}} - 1)/(D - 1)$ internal vertices,
with no leaves, in the levels above this, if any. The solution
to a linear length-bounded problem can be expressed by the
number of internal vertices in the unknown levels, that is, by

$$\alpha_i \triangleq \text{number of internal vertices in levels } [l_{\max} - i, l_{\max}] \qquad (8)$$

so that we know that

$$\alpha_0 = 0$$

and

$$\alpha_{l_{\max} - l_{\min}} = \frac{n - D^{l_{\min}}}{D - 1}.$$

If the truncated Huffman coding algorithm (as in the previous
section) fails to find a code with all $l_i \le l_{\max}$, then we
are assured that there exists an $l_i = l_{\max}$, so that $\boldsymbol{\alpha}$ can be
assumed to be a sequence of strictly increasing integers. A
strictly increasing sequence can be represented by a path on a
different type of graph, a directed acyclic graph with vertices
numbered $0$ to $(n - D^{l_{\min}})/(D - 1)$, e.g., the graph of vertices
in Fig. 5. The $i$th edge of the path begins at $\alpha_{i-1}$ and ends at
$\alpha_i$, and each $\alpha_i$ represents the number of internal vertices at
and below the corresponding level of the tree according to (8).
Fig. 1 shows a code tree with corresponding $\alpha_i$'s as a count
of internal vertices. The path length is identical to the height
of the corresponding tree, and the path weight is

$$\sum_{i=1}^{l_{\max} - l_{\min}} w(\alpha_{i-1}, \alpha_i)$$

for edge weight function $w$, to be determined. Larmore and
Przytycka used such a representation for binary codes [31];
here we use the generalized representation for $D$-ary codes.
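To make the representation concrete, the sequence of internal-vertex counts can be computed from a length vector. The sketch below assumes a full $D$-ary tree (Kraft sum equal to 1) and takes $\alpha_i$ to be the number of internal vertices at and below level $l_{\max} - i$; the function name is illustrative.

```python
def internal_counts(lengths, D, l_min):
    """Return [alpha_0, ..., alpha_{l_max - l_min}] for a full D-ary code tree."""
    l_max = max(lengths)
    internal_at = {l_max: 0}               # no internal vertices on the last level
    for level in range(l_max - 1, -1, -1):
        # Every vertex on level + 1 (leaf or internal) is a child of an
        # internal vertex on this level, and each internal vertex has D children.
        below = sum(1 for l in lengths if l == level + 1) + internal_at[level + 1]
        if below % D != 0:
            raise ValueError("lengths do not form a full D-ary tree")
        internal_at[level] = below // D
    alphas, total = [], 0
    for i in range(l_max - l_min + 1):
        total += internal_at[l_max - i]
        alphas.append(total)
    return alphas
```

For $D = 3$ and $l_{\min} = 1$, the vector (1, 1, 2, 2, 3, 3, 3) gives the sequence 0, 1, 2, and the vector (1, 2, 2, 2, 2, 2, 2) gives 0, 2.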



Fig. 5. The directed acyclic graph for coding $n = 7$, $D = 3$, $l_{\min} = 1$,
$l_{\max} = l_n = 4$ ($\varphi(\delta) = \delta$)

In order to make this representation correspond to the above
problem, we need a way of making weighted path length
correspond to coding penalty and a way of assuring a one-to-one
correspondence between valid paths and valid monotonic
code trees. First let us define the cumulative probabilities

$$s_i \triangleq \sum_{k=n-i+1}^{n} p_k$$

so that there are $n + 1$ possible values for $s_i$, each of which can
be accessed in constant time after $O(n)$-time preprocessing.
We then use these values to weigh paths such that

$$w(\alpha', \alpha'') \triangleq \begin{cases} s_{(D\alpha''^{+} - \alpha'^{+})}, & D\alpha''^{+} - \alpha'^{+} \le n \\ \infty, & D\alpha''^{+} - \alpha'^{+} > n \end{cases}$$

where we recall that $x^+$ denotes $\max(x, 0)$ and $\infty$ is necessary
for cases in which the numbers of internal vertices are
incompatible; this rules out paths not corresponding to valid
trees. Thus path length and penalty are equal, that is,



$$\sum_{i=1}^{l_{\max} - l_{\min}} w(\alpha_{i-1}, \alpha_i) = \sum_{j=1}^{n} p_j (l_j - l_{\min}).$$
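The correspondence between path weight and penalty can be checked on the running example. The Python sketch below computes the cumulative sums of the smallest probabilities and takes the weight of an edge from $\alpha'$ to $\alpha''$ to be $s_{D\alpha'' - \alpha'}$ when that index is at most $n$, and infinite otherwise. The function names are assumptions made for illustration.

```python
import math
from fractions import Fraction
from itertools import accumulate


def cumulative_tail_sums(p):
    """s[i] = sum of the i smallest probabilities (p is nonincreasing)."""
    return [Fraction(0)] + list(accumulate(reversed(p)))


def edge_weight(s, D, a1, a2):
    """Weight of the edge from a1 to a2, or infinity if the pair is incompatible."""
    n = len(s) - 1
    idx = D * max(a2, 0) - max(a1, 0)
    if not 0 <= idx <= n:
        return math.inf
    return s[idx]


def path_weight(s, D, alphas):
    """Total weight of the path visiting the vertices in alphas in order."""
    return sum(edge_weight(s, D, a, b) for a, b in zip(alphas, alphas[1:]))
```

For $\boldsymbol{p} = (0.4, 0.3, 0.14, 0.06, 0.06, 0.02, 0.02)$ and $D = 3$, the path 0, 2 has weight 3/5 and the path 0, 1, 2 has weight 2/5. These match $\sum_j p_j (l_j - l_{\min})$ for the length vectors (1, 2, 2, 2, 2, 2, 2) and (1, 1, 2, 2, 3, 3, 3) with $l_{\min} = 1$.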



This graph weighting has the concave Monge property or
quadrangle inequality,

$$w(\alpha', \alpha'') + w(\alpha' + 1, \alpha'' + 1) \le w(\alpha', \alpha'' + 1) + w(\alpha' + 1, \alpha'')$$

for all $0 < \alpha' + 1 < \alpha'' < (n - D^{l_{\min}})/(D - 1)$, since this
inequality reduces to the already-assumed $p_{n - D\alpha'' + \alpha' + 1 - D} \ge
p_{n - D\alpha'' + \alpha' + 2}$ (where $p_i \triangleq 0$ for $i > n$). Fig. 5 shows
such a graph. A single-edge path corresponds to $\boldsymbol{l} =
(1, 2, 2, 2, 2, 2, 2)$ while the two-edge path corresponds to $\boldsymbol{l} =
(1, 1, 2, 2, 3, 3, 3)$. In practice, only the latter would be under
consideration using the algorithm in question, since the pre-algorithmic
Huffman coding assured that $l_n = l_{\max} = 3$.
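The quadrangle inequality can be spot-checked by brute force on a small input. The sketch below is a numerical check, not a proof. It assumes the edge weight $s_{D\alpha'' - \alpha'}$ (infinite when the index exceeds $n$) for nonnegative vertex numbers, and the function names are hypothetical.

```python
import math
from fractions import Fraction
from itertools import accumulate


def make_weight(p, D):
    """Build the edge weight function from a probability vector p."""
    n = len(p)
    s = [Fraction(0)] + list(accumulate(reversed(p)))  # s[i]: sum of the last i entries

    def w(a1, a2):
        idx = D * a2 - a1
        return s[idx] if 0 <= idx <= n else math.inf

    return w


def satisfies_quadrangle(p, D, l_min):
    """Test the inequality for all 0 < a1 + 1 < a2 < (n - D**l_min) / (D - 1)."""
    w = make_weight(p, D)
    top = (len(p) - D ** l_min) // (D - 1)
    for a1 in range(0, top):
        for a2 in range(a1 + 2, top):
            if w(a1, a2) + w(a1 + 1, a2 + 1) > w(a1, a2 + 1) + w(a1 + 1, a2):
                return False
    return True
```

The check passes for a nonincreasing probability vector and fails when the same vector is given in increasing order, which illustrates that the property depends on the sorted input.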
Thus, if

$$k \triangleq l_{\max} - l_{\min}$$

and

$$n' \triangleq 1 + \frac{n - D^{l_{\min}}}{D - 1},$$

we wish to find the minimum $k$-link path from $0$ to $(n -
D^{l_{\min}})/(D - 1)$ on this weighted graph of $n'$ vertices. Given
the concave Monge property, an $n' 2^{O(\sqrt{\log k \log\log n'})}$-time
$O(n')$-space algorithm for solving this problem is presented
in [39]. Thus the problem in question can be solved in
$n 2^{O(\sqrt{\log(l_{\max} - l_{\min}) \log\log n})}/D$ time and $O(n/D)$ space
($O(n)$ space if one counts the pre-algorithmic Huffman coding
and/or necessary reconstruction of the Huffman code or codeword
lengths), an improvement on the Package-Merge-based
approach except for $k = O(\log n)$.

IX. Extensions 

One might wonder whether the time complexity of the
aforementioned algorithms is the minimum achievable. Special
cases (e.g., $l_{\max} \approx \log_D n$ for $\varphi(\delta) = \delta$, $l_{\min} = 0$, and $D = 2$)
can be addressed using modifications of the Package-Merge
approach [24], [33], [36], [42], [43]. Also, $\boldsymbol{p}$ often implies
ranges of values, obtainable without coding, for $l_1$ and $l_n$.
This enables one to use values of $l_{\min}$ and $l_{\max}$ that result in
a significant improvement, as in [3] for $l_{\min} = 0$.

An important problem that can be solved with the techniques
in this paper is that of finding an optimal code given
an upper bound on fringe, the difference between minimum
and maximum codeword length. One might, for example, wish
to find a fringe-limited prefix code in order to have a near-optimal
code that can be simply implemented, as in Section
VIII of [25]. Such a problem is mentioned in [1, p. 121],
where it is suggested that if there are $b - 1$ codes better than
the best code having fringe at most $d$, one can find this $b$-best
code with the $O(bn^3)$-time algorithm in [2, pp. 890-891], thus
solving the fringe-limited problem. However, this presumes
we know an upper bound for $b$ before running this algorithm.
More importantly, if a probability vector is far from uniform,
$b$ can be very large, since the number of viable code trees
is $\Theta(1.794\ldots^n)$ [11], [28]. Thus this is a poor approach in
general.

Instead, we can use the aforementioned algorithms for finding
the optimal length-bounded code with codeword lengths
restricted to $[l' - d, l']$ for each $l' \in \{\lceil \log_D n \rceil, \lceil \log_D n \rceil +
1, \ldots, \lfloor \log_D n \rfloor + d\}$, keeping the best of these codes; this
covers all feasible cases of fringe upper bounded by $d$.
(Here we again assume, without loss of generality, that
$n \bmod (D - 1) = 1$.) The overall procedure thus has time
complexity $O(nd^2)$ for the general convex quasiarithmetic
case and $nd\, 2^{O(\sqrt{\log d \log\log n})}/D$ when applying the algorithm
of Section VIII to the most common penalty of expected
length; the latter approach is of lower complexity unless
$d = O(\log n)$. Both algorithms operate with only $O(n)$ space
complexity.
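The outer loop of this fringe-limited procedure can be sketched as follows. The length-bounded solver itself is not shown: `solve` is a hypothetical callback assumed to return a `(penalty, lengths)` pair for given bounds. Clipping the lower bound at 0 is also an assumption of this sketch.

```python
def ceil_log(n, D):
    """Smallest k with D**k >= n, computed with integer arithmetic."""
    k, power = 0, 1
    while power < n:
        power *= D
        k += 1
    return k


def floor_log(n, D):
    """Largest k with D**k <= n, computed with integer arithmetic."""
    k, power = 0, D
    while power <= n:
        power *= D
        k += 1
    return k


def fringe_intervals(n, D, d):
    """Candidate bounds (l_min, l_max) = (l' - d, l') covering fringe at most d."""
    return [(max(l - d, 0), l)
            for l in range(ceil_log(n, D), floor_log(n, D) + d + 1)]


def best_fringe_limited(p, D, d, solve):
    """Run the hypothetical solver on every interval and keep the best result."""
    results = (solve(p, D, lo, hi) for lo, hi in fringe_intervals(len(p), D, d))
    return min(results, key=lambda r: r[0])
```

For $n = 7$ and $D = 3$, a fringe bound of $d = 2$ produces the two candidate intervals $[0, 2]$ and $[1, 3]$, so the solver is called at most $d + 1$ times.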